Abstract —Recently we extended the approximate message passing
(AMP) algorithm to handle general invariant matrix ensembles.
In this contribution we extend our S-AMP approach to non-linear
observation models. We obtain the generalized AMP (GAMP)
algorithm as the special case in which the measurement matrix
has zero-mean iid Gaussian entries. Our derivation is based on
1) deriving expectation propagation (EP) like algorithms from
the stationary-point equations of the Gibbs free energy under
first- and second-moment constraints and 2) applying additive
free convolution from free probability theory to obtain
low-complexity updates for the second-moment quantities.

Index Terms —Approximate Message Passing, Variational
Inference, Expectation Propagation, Free Probability

I. Introduction 

Approximate message passing techniques, e.g. 0-G), have 
recently received significant attention by the signal processing 
community. Essentially, these methods are based on taking 
the large system limit of loopy belief propagation where the 
central limit theorem can be applied when the underlying 
measurement matrix has independent and zero-mean entries. 

Variational inference techniques are well-established in the 
field of information theory e.g. m, a and machine learning 
e.g. @, 0. For example, it is well-known that exact inference 
can be formulated as the solution to a minimization problem 
of the Gibbs free energy of the underlying probabilistic 
model under certain marginalization consistency constraints 
a. We have recently shown in a that for the zero-mean 
independent identically distributed (iid) measurement matrix, 
approximate message passing (AMP) algorithm JT) can also be 
obtained from the stationary-point equations of the Gibbs free
energy under first- and second-moment consistency constraints.
Furthermore, AMP can be extended to general invariant 
matrix ensembles by means of the asymptotic spectrum of the 
measurement matrix. We call this approach S-AMP (where S 
comes from the fact that the derivation uses the S-transform). 

AMP is an estimation algorithm for linear observation
models. However, many interesting cases occur in practice
where the observation model is non-linear, e.g. non-linear forms
of compressed sensing and Gaussian processes for classification.
In this article we extend S-AMP approach m to general 
observation models. Specifically we address the sum-product 
generalized AMP (GAMP for short) algorithm 0. 

1 Note that we omit to mention the invariance property in (8). It is however 
crucial for the derivation. 


The derivation of GAMP is based on certain approximations 
(mainly Gaussian and quadratic approximations) of loopy 
belief propagation. If the measurement matrix is large and 
has zero mean and iid entries, GAMP provides excellent 
performance, e.g. a, a. Furthermore, for general matrix
ensembles it can show quite reasonable accuracy Col-. However,
the algorithm itself and its derivation are not well understood.

To better understand GAMP, in DU the authors characterize
its fixed points. Specifically, they show that GAMP can be
obtained from the stationary-point equations of some implicit
approximations of the naive mean-field approximation CD-.
These implicit approximations provide only limited insight.
Furthermore, the naive mean-field interpretation is misleading,
because the fixed points of AMP-type algorithms are typically
known as TAP-like equations, i.e. they include a correction
term to the naive mean-field solution. In fact GAMP can also be
obtained from the stationary-point equations of the Bethe free
energy (BFE) of the underlying loopy graph under first- and
second-moment constraints. However, this approach also limits
our understanding, because the BFE formulation of a loopy
graph is suitable for sparsely connected systems.

In this work we focus on the BFE formulation of a tree
graph, i.e. an exact Gibbs free energy formulation. We note
that our approach coincides with expectation propagation
(EP) m-m, since the fixed points of EP are the stationary
points of the BFE of the underlying probabilistic graph under a
set of moment consistency constraints 0.

Notations: The entries of the $N \times K$ matrix $\mathbf{X}$ are denoted
by either $X_{nk}$ or $[\mathbf{X}]_{nk}$, $n \in \mathcal{N} \triangleq \{n : 1 \le n \le N\}$
and $k \in \mathcal{K} \triangleq \{k : 1 \le k \le K\}$. Without loss of
generality we assume that $\mathcal{K}$ and $\mathcal{N}$ are disjoint. $(\cdot)^\dagger$ denotes
the transposition. We denote by $\Re z$ and $\Im z$ the real and
imaginary part of $z \in \mathbb{C}$, respectively. The entries of a vector
$\mathbf{u} \in \mathbb{R}^{T \times 1}$ are indicated by either $u_t$ or $[\mathbf{u}]_t$, $t \in [1, T]$.
Furthermore $\langle \mathbf{u} \rangle \triangleq \sum_{t=1}^{T} u_t / T$. Moreover, $\mathrm{diag}(\mathbf{u})$ is a
diagonal matrix with the elements of the vector $\mathbf{u}$ on the main
diagonal. For a square matrix $\mathbf{X}$, $\mathrm{diag}(\mathbf{X})$ is a column
vector containing the diagonal elements of $\mathbf{X}$. Furthermore
$\mathrm{Diag}(\mathbf{X}) = \mathrm{diag}(\mathrm{diag}(\mathbf{X}))$. The Gaussian probability density
function (pdf) with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ is denoted
by $N(\cdot; \boldsymbol{\mu}, \boldsymbol{\Sigma})$. Throughout the paper, when referring to “in the
large system limit” we imply that $N, K$ tend to infinity with
the ratio $\alpha = N/K$ fixed. All large system limits are assumed
to hold in the almost sure sense, unless explicitly stated.
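The notation above can be illustrated with a minimal NumPy sketch. The helper name `avg` and the example arrays are assumptions made for illustration only; note that NumPy returns a 1-D array where the text speaks of a column vector.

```python
import numpy as np

def avg(u):
    """Empirical average <u> = sum_t u_t / T (hypothetical helper name)."""
    return np.sum(u) / u.size

u = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0, 2.0], [3.0, 4.0]])

diag_u = np.diag(u)            # diag(u): diagonal matrix with u on the main diagonal
diag_X = np.diag(X)            # diag(X): diagonal entries of X (1-D array in NumPy)
Diag_X = np.diag(np.diag(X))   # Diag(X) = diag(diag(X)): off-diagonal entries set to zero
```

The same call `np.diag` covers both uses of diag(.) because NumPy dispatches on the dimension of its argument.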


II. System Model and Review of GAMP 
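Consider the estimation of a random vector $\mathbf{x} \in \mathbb{R}^{K \times 1}$
which is linearly transformed by $\mathbf{A} \in \mathbb{R}^{N \times K}$ as $\mathbf{z} = \mathbf{A}\mathbf{x}$,
then passed through a noisy channel whose output is given by
$\mathbf{y} \in \mathbb{R}^{N \times 1}$. We assume that the conditional pdf of the channel
factorizes according to

$$p(\mathbf{y}|\mathbf{z}) = \prod_{n \in \mathcal{N}} p(y_n | z_n). \tag{1}$$

Furthermore, for the Bayesian setting we assign a prior pdf for
$\mathbf{x}$ that is assumed to be factorized as

$$p(\mathbf{x}) = \prod_{k \in \mathcal{K}} p_k(x_k). \tag{2}$$
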

A. GAMP summarized 


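We summarize GAMP here for the sake of streamlining
and making the connection to the derivation of S-AMP. We
separate the GAMP iteration rules (3 into two parts: (i)
GAMP-1st order that initializes $\mathbf{x}^t$, $\boldsymbol{\tau}_x^t$ and $\mathbf{m}^t$ from tabula
rasa at $t \le 0$ and proceeds iteratively as

$$\boldsymbol{\kappa}_z^t = \mathbf{A}\mathbf{x}^t - (\mathbf{V}_z^t)^{-1}\mathbf{m}^{t-1} \tag{3}$$
$$\mathbf{z}^t = \boldsymbol{\mu}_z(\boldsymbol{\kappa}_z^t; \mathbf{V}_z^t) \tag{4}$$
$$\boldsymbol{\tau}_z^t = \boldsymbol{\sigma}_z(\boldsymbol{\kappa}_z^t; \mathbf{V}_z^t) \tag{5}$$
$$\mathbf{m}^t = \mathbf{V}_z^t(\mathbf{z}^t - \boldsymbol{\kappa}_z^t) \tag{6}$$
$$\boldsymbol{\kappa}_x^t = (\mathbf{V}_x^t)^{-1}\mathbf{A}^\dagger\mathbf{m}^t + \mathbf{x}^t \tag{7}$$
$$\mathbf{x}^{t+1} = \boldsymbol{\mu}_x(\boldsymbol{\kappa}_x^t; \mathbf{V}_x^t) \tag{8}$$
$$\boldsymbol{\tau}_x^{t+1} = \boldsymbol{\sigma}_x(\boldsymbol{\kappa}_x^t; \mathbf{V}_x^t). \tag{9}$$

(ii) GAMP-2nd order are the update rules for $\mathbf{V}_z^t$ and $\mathbf{V}_x^t$:

$$\mathbf{V}_z^t = \left(\mathrm{diag}\left((\mathbf{A} \circ \mathbf{A})\boldsymbol{\tau}_x^t\right)\right)^{-1} \tag{10}$$
$$\boldsymbol{\tau}_m^t = \mathbf{V}_z^t(\mathbf{1} - \mathbf{V}_z^t\boldsymbol{\tau}_z^t) \tag{11}$$
$$\mathbf{V}_x^t = \mathrm{diag}\left((\mathbf{A} \circ \mathbf{A})^\dagger\boldsymbol{\tau}_m^t\right). \tag{12}$$
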

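In these expressions $\mathbf{1}$ is the all-ones vector of appropriate
dimension and $\boldsymbol{\mu}_x$ and $\boldsymbol{\sigma}_x$ are scalar functions. Specifically, if
$\mathbf{V}$ is a $K \times K$ diagonal matrix and $\boldsymbol{\kappa}$ is a $K \times 1$ vector, then
for $k \in \mathcal{K}$, $[\boldsymbol{\mu}_x(\boldsymbol{\kappa}; \mathbf{V})]_k$ and $[\boldsymbol{\sigma}_x(\boldsymbol{\kappa}; \mathbf{V})]_k$ are respectively the
mean and the variance taken over the pdf

$$q_k(x_k) \propto p_k(x_k) \exp\left(-\frac{V_{kk}}{2}(x_k - \kappa_k)^2\right). \tag{13}$$

Similarly, $\boldsymbol{\mu}_z$ and $\boldsymbol{\sigma}_z$ are scalar functions such that if $\mathbf{V}$ is an
$N \times N$ diagonal matrix and $\boldsymbol{\kappa}$ is an $N \times 1$ vector, for $n \in \mathcal{N}$,
$[\boldsymbol{\mu}_z(\boldsymbol{\kappa}; \mathbf{V})]_n$ and $[\boldsymbol{\sigma}_z(\boldsymbol{\kappa}; \mathbf{V})]_n$ are respectively the mean and the
variance taken over the pdf

$$q_n(z_n) \propto p(y_n | z_n) \exp\left(-\frac{V_{nn}}{2}(z_n - \kappa_n)^2\right). \tag{14}$$

If the entries of $\mathbf{A}$ are iid with zero mean and variance $1/N$,
the iteration steps for GAMP-2nd order simplify as

$$\mathbf{V}_z^t = \frac{\alpha}{\langle \boldsymbol{\tau}_x^t \rangle}\mathbf{I}, \qquad \mathbf{V}_x^t = \langle \boldsymbol{\tau}_m^t \rangle \mathbf{I}, \tag{15}$$

where $\mathbf{I}$ is the identity matrix of appropriate dimension. We
note that if in addition $p(\mathbf{y}|\mathbf{z}) = N(\mathbf{y}; \mathbf{z}, \sigma^2\mathbf{I})$, GAMP yields
AMP, see e.g. j2j Appendix C].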
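The iteration (3)-(9) with the iid simplification (15) can be sketched in NumPy as follows. This is only an illustrative sketch under strong assumptions that are not part of the text: a zero-mean Gaussian prior and a Gaussian likelihood, chosen so that the means and variances in (13) and (14) have closed forms. All function names, problem sizes and the seed are hypothetical.

```python
import numpy as np

def gaussian_moments(kappa, V, g_mean, g_var):
    """Mean and variance of q(s) ~ N(s; g_mean, g_var) * exp(-V/2 * (s - kappa)^2),
    i.e. (13) or (14) when the prior or likelihood factor is Gaussian in s."""
    prec = 1.0 / g_var + V
    mean = (g_mean / g_var + V * kappa) / prec
    return mean, np.full_like(mean, 1.0 / prec)

def gamp_iid_gaussian(A, y, noise_var, prior_var=1.0, n_iter=100):
    """GAMP with scalar V_z, V_x as in (15); assumes iid entries of variance 1/N."""
    N, K = A.shape
    alpha = N / K
    x = np.zeros(K)
    tau_x = np.full(K, prior_var)
    m = np.zeros(N)
    for _ in range(n_iter):
        V_z = alpha / tau_x.mean()                                   # (15)
        kappa_z = A @ x - m / V_z                                    # (3)
        z, tau_z = gaussian_moments(kappa_z, V_z, y, noise_var)      # (4), (5)
        m = V_z * (z - kappa_z)                                      # (6)
        tau_m = V_z * (1.0 - V_z * tau_z)                            # (11)
        V_x = tau_m.mean()                                           # (15)
        kappa_x = A.T @ m / V_x + x                                  # (7)
        x, tau_x = gaussian_moments(kappa_x, V_x, 0.0, prior_var)    # (8), (9)
    return x

def demo(seed=0, N=1000, K=500, noise_var=0.1):
    """Compare the GAMP estimate with the exact linear MMSE estimate."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, K))
    x_true = rng.normal(size=K)
    y = A @ x_true + np.sqrt(noise_var) * rng.normal(size=N)
    x_gamp = gamp_iid_gaussian(A, y, noise_var)
    x_lmmse = np.linalg.solve(A.T @ A / noise_var + np.eye(K), A.T @ y / noise_var)
    return x_true, x_gamp, x_lmmse
```

In this all-Gaussian setting the fixed point of the iteration satisfies the linear MMSE normal equations, so comparing `x_gamp` with `x_lmmse` is a simple sanity check. For non-Gaussian priors or likelihoods, `gaussian_moments` would have to be replaced by the corresponding moments of (13) and (14).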


III. Gibbs Free Energy with Moment Constraints 

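For the sake of notational compactness, consider $\mathbf{s} = (\mathbf{x}, \mathbf{z})$.
Furthermore we introduce the set $\mathcal{V} = \mathcal{K} \cup \mathcal{N}$ and assume that
$\mathcal{K}$ and $\mathcal{N}$ are disjoint. Moreover we define

$$f_A(\mathbf{s}) \triangleq \delta(\mathbf{z} - \mathbf{A}\mathbf{x}) \tag{16}$$

$$f_v(s_v) \triangleq \begin{cases} p_v(x_v) & v \in \mathcal{K} \\ p(y_v | z_v) & v \in \mathcal{N}. \end{cases} \tag{17}$$

With these definitions, the posterior pdf of $\mathbf{s}$ reads

$$p(\mathbf{s}|\mathbf{y}, \mathbf{A}) = \frac{1}{Z} f_A(\mathbf{s}) \prod_{v \in \mathcal{V}} f_v(s_v) \tag{18}$$

with $Z$ denoting a normalization constant. The factor graph
representing (18) is a tree. Thus the BFE of (18) is equal to
its Gibbs free energy [4]:

$$G(\{\tilde{b}_v, b_A, b_v\}) \triangleq -\sum_{v \in \mathcal{V}} \int b_v(s_v) \log b_v(s_v)\, ds_v - \int b_A(\mathbf{s}) \log \frac{f_A(\mathbf{s})}{b_A(\mathbf{s})}\, d\mathbf{s} - \sum_{v \in \mathcal{V}} \int \tilde{b}_v(s_v) \log \frac{f_v(s_v)}{\tilde{b}_v(s_v)}\, ds_v. \tag{19}$$

Here $b_A$ and $\tilde{b}_v$, $v \in \mathcal{V}$, denote the beliefs of the factors in
(18), while $b_v$, $v \in \mathcal{V}$, denote the beliefs of the unknown
variables in (18). Without loss of generality we assume that
the expressions $f_A(\mathbf{s})/b_A(\mathbf{s})$ and $f_v(s_v)/\tilde{b}_v(s_v)$ in (19) are
strictly continuous, so that the Gibbs free energy is well
defined. Indeed this is what we will end up with in the analysis.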

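If we define a Lagrangian for (19) that accounts for certain
marginalization consistency constraints, then at its stationary
point, the belief $b_v(s_v)$ is equal to $p(s_v|\mathbf{y}, \mathbf{A})$ for all $v \in \mathcal{V}$
0. Instead, following the arguments of E), we define the
Lagrangian on the basis of a set of moment consistency
constraints as

$$\mathcal{L}(\{\tilde{b}_v, b_A, b_v\}) \triangleq G(\{\tilde{b}_v, b_A, b_v\}) + \mathcal{Z} - \sum_{v \in \mathcal{V}} \boldsymbol{\nu}_v^\dagger \int \boldsymbol{\phi}(s_v)\{b_A(\mathbf{s}) - b_v(s_v)\}\, d\mathbf{s} - \sum_{v \in \mathcal{V}} \tilde{\boldsymbol{\nu}}_v^\dagger \int \boldsymbol{\phi}(s_v)\{\tilde{b}_v(s_v) - b_v(s_v)\}\, ds_v. \tag{20}$$

Here we consider constraints on the mean and variance, i.e.
$\boldsymbol{\phi}(s_v) = (s_v, s_v^2)$, $v \in \mathcal{V}$. For convenience we write the
Lagrangian multipliers as

$$\boldsymbol{\nu}_v \triangleq \left(\gamma_v, -\frac{\Lambda_{vv}}{2}\right), \qquad \tilde{\boldsymbol{\nu}}_v \triangleq \left(\rho_v, -\frac{V_{vv}}{2}\right), \qquad v \in \mathcal{V}.$$

The term $\mathcal{Z}$ accounts for the normalization constraints:

$$\mathcal{Z} \triangleq -\beta_A\left(1 - \int b_A(\mathbf{s})\, d\mathbf{s}\right) - \sum_{v \in \mathcal{V}} \beta_v\left(1 - \int b_v(s_v)\, ds_v\right) - \sum_{v \in \mathcal{V}} \tilde{\beta}_v\left(1 - \int \tilde{b}_v(s_v)\, ds_v\right)$$

where $\beta_A$, $\beta_v$, $\tilde{\beta}_v$ are the associated Lagrange multipliers.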





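We formulate the estimate of $s_v$, $v \in \mathcal{V}$, as

$$\hat{s}_v \triangleq \int s_v\, b_v^*(s_v)\, ds_v \tag{21}$$

where $b_v^*(s_v)$ represents $b_v(s_v)$ at a stationary point of (20).

Thereby (31) has a form identical to (13) and (14) for $v \in \mathcal{K}$
and for $v \in \mathcal{N}$, respectively. Then let us define the functions
$\boldsymbol{\mu}(\boldsymbol{\kappa}; \mathbf{V})$ and $\boldsymbol{\sigma}(\boldsymbol{\kappa}; \mathbf{V})$ as in (32) and (33).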

A. The Stationary Points of the Lagrangian 

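For notational convenience we introduce first the $(K+N) \times
(K+N)$ diagonal matrices $\boldsymbol{\Lambda}$ and $\mathbf{V}$ as well as the $(K+N) \times 1$
vectors $\boldsymbol{\gamma}$ and $\boldsymbol{\rho}$ whose entries are respectively $\Lambda_{vv}$, $V_{vv}$, $\gamma_v$
and $\rho_v$, $v \in \mathcal{V}$. In connection with the variables $\mathbf{x}$ and $\mathbf{z}$ we write

$$\boldsymbol{\Lambda} = \begin{pmatrix} \boldsymbol{\Lambda}_x & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Lambda}_z \end{pmatrix}, \qquad \boldsymbol{\gamma} = (\boldsymbol{\gamma}_x, \boldsymbol{\gamma}_z) \tag{22}$$

$$\mathbf{V} = \begin{pmatrix} \mathbf{V}_x & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_z \end{pmatrix}, \qquad \boldsymbol{\rho} = (\boldsymbol{\rho}_x, \boldsymbol{\rho}_z). \tag{23}$$

The dimensions of $\boldsymbol{\Lambda}_x$ and $\mathbf{V}_x$ are $K \times K$; the vectors $\boldsymbol{\gamma}_x$ and $\boldsymbol{\rho}_x$
have dimension $K \times 1$.

Following the arguments of 0, we have the stationary
points of the Lagrangian (20) in the form

$$b_v^*(s_v) = \frac{1}{Z_v} \exp\left((\boldsymbol{\nu}_v + \tilde{\boldsymbol{\nu}}_v)^\dagger \boldsymbol{\phi}(s_v)\right), \quad v \in \mathcal{V} \tag{24}$$

$$\tilde{b}_v^*(s_v) = \frac{1}{\tilde{Z}_v} f_v(s_v) \exp\left(\tilde{\boldsymbol{\nu}}_v^\dagger \boldsymbol{\phi}(s_v)\right), \quad v \in \mathcal{V} \tag{25}$$

$$b_A^*(\mathbf{s}) = \frac{1}{Z_A} f_A(\mathbf{s}) \exp\left(-\frac{1}{2}\mathbf{s}^\dagger\boldsymbol{\Lambda}\mathbf{s} + \mathbf{s}^\dagger\boldsymbol{\gamma}\right) \tag{26}$$

where $Z_A$, $Z_v$, $\tilde{Z}_v$ are the associated normalization constants.

Let us first consider the marginalization of the belief $b_A^*(\mathbf{s})$
with respect to $\mathbf{z}$:

$$b_A^*(\mathbf{x}) = \int b_A^*(\mathbf{x}, \mathbf{z})\, d\mathbf{z} = N(\mathbf{x}; \hat{\mathbf{x}}, \boldsymbol{\Sigma}_x) \tag{27}$$

where

$$\boldsymbol{\Sigma}_x \triangleq (\boldsymbol{\Lambda}_x + \mathbf{A}^\dagger\boldsymbol{\Lambda}_z\mathbf{A})^{-1}, \qquad \hat{\mathbf{x}} \triangleq \boldsymbol{\Sigma}_x(\boldsymbol{\gamma}_x + \mathbf{A}^\dagger\boldsymbol{\gamma}_z). \tag{28}$$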


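$$\boldsymbol{\mu}(\boldsymbol{\kappa}; \mathbf{V}) \triangleq \left(\boldsymbol{\mu}_x(\boldsymbol{\kappa}_x; \mathbf{V}_x), \boldsymbol{\mu}_z(\boldsymbol{\kappa}_z; \mathbf{V}_z)\right) \tag{32}$$

$$\boldsymbol{\sigma}(\boldsymbol{\kappa}; \mathbf{V}) \triangleq \left(\boldsymbol{\sigma}_x(\boldsymbol{\kappa}_x; \mathbf{V}_x), \boldsymbol{\sigma}_z(\boldsymbol{\kappa}_z; \mathbf{V}_z)\right). \tag{33}$$

The entries $[\boldsymbol{\mu}(\boldsymbol{\kappa}; \mathbf{V})]_v$ and $[\boldsymbol{\sigma}(\boldsymbol{\kappa}; \mathbf{V})]_v$ are respectively the
mean and variance of the belief (25). Moreover we introduce

$$\boldsymbol{\Sigma} \triangleq \begin{pmatrix} \boldsymbol{\Sigma}_x & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_z \end{pmatrix}, \qquad \hat{\mathbf{s}} \triangleq (\hat{\mathbf{x}}, \hat{\mathbf{z}}). \tag{34}$$

With these definitions, the identities resulting from the moment
consistency constraints are given by

$$\hat{\mathbf{s}} = \mathrm{Diag}(\boldsymbol{\Sigma})(\boldsymbol{\gamma} + \boldsymbol{\rho}), \qquad \hat{\mathbf{s}} = \boldsymbol{\mu}(\boldsymbol{\kappa}; \mathbf{V}) \tag{35}$$

$$\mathrm{Diag}(\boldsymbol{\Sigma}) = (\boldsymbol{\Lambda} + \mathbf{V})^{-1}, \qquad \mathrm{diag}(\boldsymbol{\Sigma}) = \boldsymbol{\sigma}(\boldsymbol{\kappa}; \mathbf{V}). \tag{36}$$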

B. The TAP-like Equations and GAMP-1st Order 

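By using the fixed-point identities presented in
Section III-A one can introduce numerous fixed-point algorithms.
In this work we restrict our attention to TAP-like algorithms,
e.g. 0, CD, M-. To that end we start with the definitions
in (28) and write

$$\boldsymbol{\gamma}_x = -\mathbf{A}^\dagger\boldsymbol{\gamma}_z + (\boldsymbol{\Lambda}_x + \mathbf{A}^\dagger\boldsymbol{\Lambda}_z\mathbf{A})\hat{\mathbf{x}}. \tag{37}$$

Furthermore by making use of the identities in (35) and (36)
we have

$$\boldsymbol{\rho}_x = \mathbf{A}^\dagger\boldsymbol{\gamma}_z - (\boldsymbol{\Lambda}_x + \mathbf{A}^\dagger\boldsymbol{\Lambda}_z\mathbf{A})\hat{\mathbf{x}} + (\boldsymbol{\Lambda}_x + \mathbf{V}_x)\hat{\mathbf{x}} \tag{38}$$
$$= \mathbf{A}^\dagger(\boldsymbol{\gamma}_z - \boldsymbol{\Lambda}_z\mathbf{A}\hat{\mathbf{x}}) + \mathbf{V}_x\hat{\mathbf{x}} \tag{39}$$
$$= \mathbf{A}^\dagger\mathbf{m} + \mathbf{V}_x\hat{\mathbf{x}} \quad \text{with} \quad \mathbf{m} \triangleq (\boldsymbol{\gamma}_z - \boldsymbol{\Lambda}_z\mathbf{A}\hat{\mathbf{x}}). \tag{40}$$

Moreover, by the definition of $\mathbf{m}$ we also point out that

$$\mathbf{m} = (\boldsymbol{\Lambda}_z + \mathbf{V}_z)\hat{\mathbf{z}} - \boldsymbol{\rho}_z - \boldsymbol{\Lambda}_z\mathbf{A}\hat{\mathbf{x}} \tag{41}$$
$$= \mathbf{V}_z\hat{\mathbf{z}} - \boldsymbol{\rho}_z = \mathbf{V}_z(\hat{\mathbf{z}} - \boldsymbol{\kappa}_z). \tag{42}$$
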

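Here we note that $\boldsymbol{\Sigma}_x$ is positive definite since $b_A^*(\mathbf{x})$ is a
well-defined pdf. Second, let us consider the marginalization
over $\mathbf{x}$, which basically follows from the linear transformation
property of a Gaussian random vector:

$$b_A^*(\mathbf{z}) = \frac{e^{-\frac{1}{2}\mathbf{z}^\dagger\boldsymbol{\Lambda}_z\mathbf{z} + \mathbf{z}^\dagger\boldsymbol{\gamma}_z}}{Z_A} \int \delta(\mathbf{z} - \mathbf{A}\mathbf{x})\, e^{-\frac{1}{2}\mathbf{x}^\dagger\boldsymbol{\Lambda}_x\mathbf{x} + \mathbf{x}^\dagger\boldsymbol{\gamma}_x}\, d\mathbf{x} = \int \delta(\mathbf{z} - \mathbf{A}\mathbf{x}) N(\mathbf{x}; \hat{\mathbf{x}}, \boldsymbol{\Sigma}_x)\, d\mathbf{x} = N(\mathbf{z}; \hat{\mathbf{z}}, \boldsymbol{\Sigma}_z) \tag{29}$$

where $\hat{\mathbf{z}} = \mathbf{A}\hat{\mathbf{x}}$ and $\boldsymbol{\Sigma}_z = \mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^\dagger$.

At this stage it is convenient to define

$$\boldsymbol{\kappa} = (\boldsymbol{\kappa}_x, \boldsymbol{\kappa}_z) \triangleq (\mathbf{V}_x^{-1}\boldsymbol{\rho}_x, \mathbf{V}_z^{-1}\boldsymbol{\rho}_z) \tag{30}$$

with $\boldsymbol{\kappa}_x \in \mathbb{R}^K$. In this way we can write the belief in (25) as

$$\tilde{b}_v^*(s_v) \propto f_v(s_v) \exp\left(-\frac{V_{vv}}{2}(s_v - \kappa_v)^2\right). \tag{31}$$
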

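Thereby we exactly obtain the fixed points of GAMP-1st order,
i.e. (3)-(9). Now let us keep the iteration steps of GAMP-1st
order but define the update rules for $\boldsymbol{\Lambda}_x$ and $\boldsymbol{\Lambda}_z$ on the basis
of the fixed-point identities in (36). For example:

$$\boldsymbol{\Lambda}_z^t = \left(\mathrm{diag}(\boldsymbol{\tau}_z^{t-1})\right)^{-1} - \mathbf{V}_z^{t-1} \tag{43}$$
$$\boldsymbol{\Sigma}_x^t = \left(\boldsymbol{\Lambda}_x^{t-1} + \mathbf{A}^\dagger\boldsymbol{\Lambda}_z^t\mathbf{A}\right)^{-1} \tag{44}$$
$$\mathbf{V}_z^t = \mathrm{Diag}\left(\mathbf{A}\boldsymbol{\Sigma}_x^t\mathbf{A}^\dagger\right)^{-1} - \boldsymbol{\Lambda}_z^t \tag{45}$$
$$\boldsymbol{\Lambda}_x^t = \mathrm{diag}(\boldsymbol{\tau}_x^t)^{-1} - \mathbf{V}_x^{t-1} \tag{46}$$
$$\mathbf{V}_x^t = \mathrm{Diag}(\boldsymbol{\Sigma}_x^t)^{-1} - \boldsymbol{\Lambda}_x^t. \tag{47}$$

In this way we obtain a new fixed-point algorithm whose
fixed points are the stationary points of the Lagrangian (20).
However, from the complexity point of view these updates are
problematic due to the matrix inversion in (44). In the sequel
we will address how to bypass (44) when $K$, $N$ are large.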





C. The Large-System Simplifications 

To circumvent the complexity problem of (44), we utilize
the so-called additive free convolution in free probability
theory m. The reduction that we obtain in this way can
also be obtained by means of the self-averaging ansatz in [14,
Section 3.1].

In order to make use of additive free convolution we need
to restrict our consideration to invariant matrix ensembles:

ASSUMPTION 1 Consider the singular value decomposition
$\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}$ where $\mathbf{U} \in \mathbb{R}^{N \times N}$ and $\mathbf{V} \in \mathbb{R}^{K \times K}$ are orthogonal
matrices and $\mathbf{D}$ is an $N \times K$ non-negative diagonal matrix.
We distinguish between the invariance assumption on $\mathbf{A}$ from
right and from left: a) $\mathbf{A}$ is invariant from right, i.e. $\mathbf{V}$ is
Haar distributed; b) $\mathbf{A}$ is invariant from left, i.e. $\mathbf{U}$ is Haar
distributed.


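Then by invoking (48) we easily obtain that (see Appendix B)

$$q \simeq \frac{1}{K} \sum_{k \in \mathcal{K}} \frac{1}{[\boldsymbol{\Lambda}_x]_{kk} + \mathcal{R}^K_{\mathbf{J}_z}(-q)}. \tag{51}$$

Thereby, we conclude that

$$[\mathbf{V}_x]_{kk} \simeq \mathcal{R}^K_{\mathbf{J}_z}(-q), \quad k \in \mathcal{K}. \tag{52}$$

The average of (52) over the random matrix $\mathbf{A}$ agrees with [14,
Eq. (50)]. Note that the simplification in (52) is still implicit
due to the definition of $q$ in (51). Subsequently we present
an explicit complexity simplification for $[\mathbf{V}_x]_{kk}$. First we note
that (52) states that we can replace all the elements $[\mathbf{V}_x]_{kk}$,
$k \in \mathcal{K}$, by a single scalar quantity, say $V_x$. This allows us to
write $q \simeq \langle \boldsymbol{\sigma}_x(\boldsymbol{\kappa}_x; V_x\mathbf{I}) \rangle$ with $\boldsymbol{\kappa}_x = V_x^{-1}\mathbf{A}^\dagger\mathbf{m} + \hat{\mathbf{x}}$. Then,
from (52) we write an explicit fixed-point identity for $V_x$ as

$$V_x = \mathcal{R}^K_{\mathbf{J}_z}\left(-\langle \boldsymbol{\sigma}_x(\boldsymbol{\kappa}_x; V_x\mathbf{I}) \rangle\right). \tag{53}$$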
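The self-consistency relations (50)-(52) can be checked numerically. The sketch below is an illustration under assumptions that go beyond the text: the entries of A are iid Gaussian with variance 1/N, the diagonal matrices are drawn independently of A, and the R-transform of J_z is evaluated with the iid closed form given in (63). Function names, sizes and the seed are hypothetical.

```python
import numpy as np

def r_transform_Jz_iid(omega, lam_z, alpha):
    """Closed form (63) for the R-transform of J_z = A^T Lambda_z A, iid Gaussian A."""
    return np.mean(1.0 / (1.0 / lam_z - omega / alpha))

def check_fixed_point(seed=0, N=1200, K=600):
    rng = np.random.default_rng(seed)
    alpha = N / K
    A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, K))
    lam_x = rng.uniform(0.5, 1.5, size=K)        # diagonal of Lambda_x (assumed values)
    lam_z = rng.uniform(0.5, 1.5, size=N)        # diagonal of Lambda_z (assumed values)
    J_z = A.T @ (lam_z[:, None] * A)
    Sigma_x = np.linalg.inv(np.diag(lam_x) + J_z)
    q = np.trace(Sigma_x) / K                    # left-hand side, definition (50)
    R = r_transform_Jz_iid(-q, lam_z, alpha)     # R-transform of J_z evaluated at -q
    rhs = np.mean(1.0 / (lam_x + R))             # right-hand side of (51)
    V_x_diag = 1.0 / np.diag(Sigma_x) - lam_x    # [V_x]_kk obtained from (36)
    return q, rhs, R, V_x_diag
```

For matrices of this size `q` and `rhs` should agree closely, and the entries of `V_x_diag` should scatter around the single scalar `R`, which is the content of (52).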


It indeed makes sense to distinguish between the invariance
from right and the invariance from left. For example, once
we consider the classical linear observation model such as
$p(\mathbf{y}|\mathbf{z}) = N(\mathbf{y}; \mathbf{z}, \sigma^2\mathbf{I})$, then $\boldsymbol{\Lambda}_z = \mathbf{I}/\sigma^2$. In this case we do
not need to consider Assumption 1-b).

Second, we make the following technical assumption on the
limiting spectrum of the respective matrices:

ASSUMPTION 2 As $N, K \to \infty$ with the ratio $\alpha = N/K$
fixed let the spectra of $\boldsymbol{\Lambda}_x$, $\boldsymbol{\Lambda}_z$ and $\mathbf{A}^\dagger\mathbf{A}$ converge almost
surely to some limiting spectra whose supports are compact.

Due to the lack of an explicit definition of the “Lagrangian” matrix
$\boldsymbol{\Lambda}$, Assumption 2 is rather implicit. Nevertheless it can be
considered in the same vein as the so-called thermodynamic
limit in statistical physics: all microscopic variables converge
to deterministic values in the thermodynamic limit [16].

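For example, under Assumption 1-a) and Assumption 2, it
turns out that $\boldsymbol{\Lambda}_x$ and $\mathbf{J}_z \triangleq \mathbf{A}^\dagger\boldsymbol{\Lambda}_z\mathbf{A}$ are asymptotically free
G3 and from [15, Lemma 3.3.4] we have that²

$$R^K_{\boldsymbol{\Lambda}_x + \mathbf{J}_z}(\omega) \simeq R^K_{\boldsymbol{\Lambda}_x}(\omega) + R^K_{\mathbf{J}_z}(\omega), \quad \Im\omega < 0. \tag{48}$$

Here for a $T \times T$ symmetric matrix $\mathbf{X}$, $R^T_{\mathbf{X}}$ denotes the
R-transform of the spectrum of $\mathbf{X}$ (see Appendix A) and $\simeq$
stands for the large system approximation that turns into an
almost sure equality in the large system limit. Furthermore
we introduce

$$\mathcal{R}^T_{\mathbf{X}}(r) \triangleq \lim_{\omega \to r} \Re R^T_{\mathbf{X}}(\omega), \quad \Im r = 0 \tag{49}$$

whenever the limit exists.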

It turns out that by solely invoking “additive free convolution”,
e.g. ©, we can easily solve the complexity issue of
the fixed-point identities for $\mathbf{V}_x$ and $\mathbf{V}_z$ so that they do not require
matrix inversion. First we consider the simplification for $\mathbf{V}_x$.
To that end let us first define the auxiliary variable

$$q \triangleq \frac{1}{K}\mathrm{tr}\left\{(\boldsymbol{\Lambda}_x + \mathbf{J}_z)^{-1}\right\} = \frac{1}{K}\sum_{k \in \mathcal{K}} \frac{1}{[\boldsymbol{\Lambda}_x]_{kk} + [\mathbf{V}_x]_{kk}}. \tag{50}$$

2 In fact we can define the R-transform on the negative real line. However, in the
exposition this requires the implicit assumption that $\boldsymbol{\Lambda}$ is positive definite.

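As a second part we address a similar complexity simplification
for $[\mathbf{V}_z]_{nn}$, $n \in \mathcal{N}$. To that end let us introduce an
auxiliary $N \times 1$ vector $\boldsymbol{\tau}_m$ whose entries are defined as

$$[\boldsymbol{\tau}_m]_n \triangleq [\boldsymbol{\Lambda}_z]_{nn} - [\boldsymbol{\Lambda}_z]_{nn}^2\left[\mathbf{A}(\boldsymbol{\Lambda}_x + \mathbf{A}^\dagger\boldsymbol{\Lambda}_z\mathbf{A})^{-1}\mathbf{A}^\dagger\right]_{nn} \tag{54}$$
$$= \left[(\boldsymbol{\Lambda}_z^{-1} + \mathbf{A}\boldsymbol{\Lambda}_x^{-1}\mathbf{A}^\dagger)^{-1}\right]_{nn}, \tag{55}$$

where (55) follows directly from Woodbury's matrix inversion
lemma. Furthermore by making use of (36) for (54) we can
write the following fixed-point identity

$$\left[(\boldsymbol{\Lambda}_z^{-1} + \mathbf{A}\boldsymbol{\Lambda}_x^{-1}\mathbf{A}^\dagger)^{-1}\right]_{nn} = [\boldsymbol{\Lambda}_z]_{nn} - \frac{[\boldsymbol{\Lambda}_z]_{nn}^2}{[\boldsymbol{\Lambda}_z]_{nn} + [\mathbf{V}_z]_{nn}} \tag{56}$$
$$= \frac{1}{[\boldsymbol{\Lambda}_z^{-1}]_{nn} + [\mathbf{V}_z^{-1}]_{nn}}. \tag{57}$$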
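The chain (54)-(57) consists of exact matrix identities, so it can be verified on a small random instance. The sketch below is illustrative only: the function name and the test matrices are assumptions, and $[\mathbf{V}_z]_{nn}$ is obtained from the identity (36) applied to $\boldsymbol{\Sigma}_z = \mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^\dagger$.

```python
import numpy as np

def tau_m_three_ways(A, lam_x, lam_z):
    """Evaluate [tau_m]_n via (54), via (55), and via (57) with V_z taken from (36)."""
    Sigma_x = np.linalg.inv(np.diag(lam_x) + A.T @ (lam_z[:, None] * A))
    Sigma_z_diag = np.einsum('nk,kl,nl->n', A, Sigma_x, A)    # diag(A Sigma_x A^T)
    via_54 = lam_z - lam_z ** 2 * Sigma_z_diag
    via_55 = np.diag(np.linalg.inv(np.diag(1.0 / lam_z) + (A / lam_x) @ A.T))
    V_z = 1.0 / Sigma_z_diag - lam_z                          # (36): Diag(Sigma_z) = (Lambda_z + V_z)^{-1}
    via_57 = 1.0 / (1.0 / lam_z + 1.0 / V_z)
    return via_54, via_55, via_57

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
lam_x = rng.uniform(0.5, 2.0, size=4)
lam_z = rng.uniform(0.5, 2.0, size=6)
t54, t55, t57 = tau_m_three_ways(A, lam_x, lam_z)
```

All three arrays should coincide up to floating-point error, independently of the matrix ensemble, since no large-system approximation has been used at this point.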

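Thus, we can invoke identical arguments on the additive
free convolution approximation above for $[\mathbf{V}_z]_{nn}$ as well.
Specifically, under Assumption 1-b) and Assumption 2, for
large $N$, $K$ we have

$$[\mathbf{V}_z]_{nn} \simeq \frac{1}{\mathcal{R}^N_{\mathbf{J}_x}(-\langle \boldsymbol{\tau}_m \rangle)}, \quad n \in \mathcal{N} \tag{58}$$

with $\mathbf{J}_x \triangleq \mathbf{A}\boldsymbol{\Lambda}_x^{-1}\mathbf{A}^\dagger$. The complexity simplification (58) is
still implicit due to the definition of $\boldsymbol{\tau}_m$. To present an explicit
form of it, consider first (56) and (57), from which we can write

$$[\boldsymbol{\tau}_m]_n = [\mathbf{V}_z]_{nn} - \frac{[\mathbf{V}_z]_{nn}^2}{[\mathbf{V}_z]_{nn} + [\boldsymbol{\Lambda}_z]_{nn}} \tag{59}$$
$$= [\mathbf{V}_z]_{nn}\left(1 - [\mathbf{V}_z]_{nn}[\boldsymbol{\sigma}_z(\boldsymbol{\kappa}_z; \mathbf{V}_z)]_n\right). \tag{60}$$

On the other hand, (58) implies that we can replace all the
elements $[\mathbf{V}_z]_{nn}$, $n \in \mathcal{N}$, by a single scalar quantity, say $V_z$.
Now for convenience let us define the $N \times 1$ vector $\boldsymbol{\tau}_m$ whose
entries are given by

$$[\boldsymbol{\tau}_m]_n \triangleq V_z\left(1 - V_z[\boldsymbol{\sigma}_z(\boldsymbol{\kappa}_z; V_z\mathbf{I})]_n\right), \quad n \in \mathcal{N}. \tag{61}$$

Then, following (58), we introduce an explicit fixed-point
identity for $V_z$ as

$$V_z = \frac{1}{\mathcal{R}^N_{\mathbf{J}_x}(-\langle \boldsymbol{\tau}_m \rangle)}. \tag{62}$$
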






So far we have shown in (53) and (62) how to bypass the
need for matrix inversion to “update” $\boldsymbol{\Lambda}_x$ and $\boldsymbol{\Lambda}_z$, respectively.
However, this treatment requires solving a highly non-trivial
random matrix problem, i.e. deriving the closed-form solutions
for $\mathcal{R}^K_{\mathbf{J}_z}$ and $\mathcal{R}^N_{\mathbf{J}_x}$. This is usually, though not always, not
possible. On the other hand, deriving the solution of e.g. $\mathcal{R}^K_{\mathbf{J}_z}$
in the limiting case, denoted $\mathcal{R}_{\mathbf{J}_z}$, is rather simpler. Due to
the uniform convergence property of the R-transform [15,
Lemma 3.3.4], this approach would allow us to accurately
predict, for example, $\mathcal{R}^K_{\mathbf{J}_z}$ for large $N$, $K$. This is what we
show in the next subsection for the zero-mean iid Gaussian
matrix ensemble.

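Example: The zero-mean and iid case, i.e. GAMP: In this
section we provide the explicit solutions for $V_x$ and $V_z$ when
the entries of $\mathbf{A}$ are assumed to be iid Gaussian with zero
mean and variance $1/N$.

From the well-known Marchenko-Pastur theorem we obtain
that (see Appendix C)

$$R^K_{\mathbf{J}_z}(\omega) \simeq \frac{1}{N}\sum_{n \in \mathcal{N}} \frac{1}{[\boldsymbol{\Lambda}_z^{-1}]_{nn} - \omega/\alpha} \tag{63}$$

$$R^N_{\mathbf{J}_x}(\omega) \simeq \frac{1}{\alpha K}\sum_{k \in \mathcal{K}} \frac{1}{[\boldsymbol{\Lambda}_x]_{kk} - \omega}. \tag{64}$$

Then we obtain the following expressions for $V_x$ and $V_z$:

$$V_x \simeq \frac{1}{N}\sum_{n \in \mathcal{N}} \frac{1}{[\boldsymbol{\Lambda}_z^{-1}]_{nn} + \langle \boldsymbol{\sigma}_x(\boldsymbol{\kappa}_x; V_x\mathbf{I}) \rangle/\alpha} \tag{65}$$

$$\frac{1}{V_z} \simeq \frac{1}{\alpha K}\sum_{k \in \mathcal{K}} \frac{1}{[\boldsymbol{\Lambda}_x]_{kk} + \langle \boldsymbol{\tau}_m \rangle}. \tag{66}$$

From these equations one can conclude that

$$V_z \simeq \frac{\alpha}{\langle \boldsymbol{\sigma}_x(\boldsymbol{\kappa}_x; V_x\mathbf{I}) \rangle}, \qquad V_x \simeq \langle \boldsymbol{\tau}_m \rangle. \tag{67}$$

Thus we recover the fixed point of the GAMP-2nd order
updates for the zero-mean iid matrix ensemble as in (15).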


IV. Conclusion 

For the given zero-mean iid Gaussian matrix ensemble,
the fixed points of GAMP “asymptotically” coincide with the
stationary points of the Gibbs free energy under first- and
second-moment constraints. It turns out that the only critical
issue for GAMP is the update rules for the “variance” parameters
$V_x$ and $V_z$. These parameters play a central role. Specifically, a
crude update rule for a given measurement matrix ensemble
would completely spoil the optimality of the algorithm. If,
for general invariant matrix ensembles, $V_x$ and $V_z$ are
updated based on the R-transform formulation in (53) and
(62), the algorithm “asymptotically” fulfills the stationary-point
identities of the Gibbs free energy formulation. Once the
closed-form expressions of (53) and (62) are obtained, the
resulting algorithm includes solely $O(N)$ operations. But the
computation of the solutions to these identities is not trivial.
Nevertheless it is sometimes doable, e.g. for the random row
orthogonal matrix ensembles. Furthermore, once either the
prior or the likelihood is expressed in terms of a Gaussian
function, the R-transform formulation becomes rather trivial.
In general, updating $V_x$ and $V_z$ requires a matrix inversion
at each iteration, e.g. see (43)-(47). An alternative, but
sub-optimal, method would be the Swept-AMP algorithm [10] that
is based on the GAMP methodology and includes $O(N^2)$
operations.
Appendix A
Preliminaries

Let $P_X$ be a probability distribution on the real line. We denote
the Stieltjes transform of $P_X$ as

$$G_X(s) \triangleq \int \frac{dP_X(x)}{x - s}, \quad \Im s > 0 \tag{68}$$

where $\Im G_X(s) > 0$ [18].

The R-transform of $P_X$ is defined as ED

$$R_X(\omega) \triangleq G_X^{-1}(-\omega) - \frac{1}{\omega}, \quad \Im\omega < 0 \tag{69}$$

with $G_X^{-1}$ denoting the inverse of $G_X$. Equivalently,

$$R_X(-G_X(s)) = s + \frac{1}{G_X(s)}. \tag{70}$$
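The definitions (68)-(70) can be evaluated for an empirical spectrum. The sketch below is an illustration under assumptions not made in this appendix: it takes $\mathbf{A}^\dagger\mathbf{A}$ with iid Gaussian entries of variance 1/N, for which the limiting R-transform under the convention (69) is $1/(1 - \omega/\alpha)$, consistent with (80) for $\boldsymbol{\Lambda}_z = \mathbf{I}$. Function names, sizes and the evaluation point are hypothetical.

```python
import numpy as np

def stieltjes(eigs, s):
    """Stieltjes transform (68) of the empirical spectrum given by eigs."""
    return np.mean(1.0 / (eigs - s))

def r_transform_point(eigs, s):
    """Return omega = -G(s) and R(omega) = s + 1/G(s), following (70)."""
    G = stieltjes(eigs, s)
    return -G, s + 1.0 / G

def demo(seed=0, N=1000, K=500, s=0.5 + 1.0j):
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, K))
    eigs = np.linalg.eigvalsh(A.T @ A)
    omega, R_emp = r_transform_point(eigs, s)
    R_limit = 1.0 / (1.0 - omega * K / N)   # 1/(1 - omega/alpha), alpha = N/K
    return stieltjes(eigs, s), omega, R_emp, R_limit
```

Evaluating at a point with positive imaginary part yields a Stieltjes transform with positive imaginary part and an R-transform argument in the lower half-plane, matching the domains stated in (68) and (69).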

Here we draw the attention of the reader to the fact that $\Im R_X(\omega) < 0$ for
$\Im\omega < 0$ unless $P_X$ is a Dirac distribution. This fact follows
from the following property of the Stieltjes transform [18,
Proposition 2.2]: for $\Im s > 0$, $\Im\left\{\frac{1}{G_X(s)} + s\right\} \le 0$, where the
equality holds if, and only if, $P_X$ is a Dirac distribution.

REMARK 1 Let $P_X$ have a pdf $p_X$. Furthermore let $p_X(0) = 0$,
so that $\lim_{\epsilon \to 0^+} \Im G_X(j\epsilon) = 0$. Moreover let

$$q \triangleq \lim_{\epsilon \to 0^+} G_X(j\epsilon) = \int x^{-1}\, dP_X(x) < \infty. \tag{71}$$

Then we have the following identity

$$\frac{1}{q} = \lim_{\omega \to q} R_X(-\omega). \tag{72}$$

Consider a $T \times T$ symmetric matrix $\mathbf{X}$. Let $\mathcal{L}$ be the set
containing the eigenvalues of $\mathbf{X}$. The spectrum of $\mathbf{X}$ is
denoted by

$$P^T_{\mathbf{X}}(x) \triangleq \frac{1}{T}\left|\{\lambda \in \mathcal{L} : \lambda \le x\}\right|. \tag{73}$$

We denote the Stieltjes transform and the R-transform of $P^T_{\mathbf{X}}$
by $G^T_{\mathbf{X}}$ and $R^T_{\mathbf{X}}$, respectively. Furthermore, if for $T \to \infty$, $\mathbf{X}$
has a limiting spectrum almost surely, it is denoted by $P_{\mathbf{X}}$.
Moreover, the Stieltjes transform and the R-transform of $P_{\mathbf{X}}$
are denoted by $G_{\mathbf{X}}$ and $R_{\mathbf{X}}$, respectively.

Appendix B 
Proof of (51)

Note that from the definition in (50) we have

$$q = \int x^{-1}\, dP^K_{\boldsymbol{\Lambda}_x + \mathbf{J}_z}(x). \tag{74}$$

By invoking Remark 1 and (48) (under Assumption 1-a)
and Assumption 2), successively we can write


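Here, without loss of generality, we can define $q_\epsilon = q + \epsilon$. On
the other hand, from the definition of the R-transform in (70)
we have

$$R^K_{\boldsymbol{\Lambda}_x}\left(-G^K_{\boldsymbol{\Lambda}_x}(-s)\right) + s - \frac{1}{G^K_{\boldsymbol{\Lambda}_x}(-s)} = 0, \quad \Im s < 0. \tag{76}$$

Hence we can write

$$q_\epsilon + j\epsilon \simeq G^K_{\boldsymbol{\Lambda}_x}\left(-R^K_{\mathbf{J}_z}(-q_\epsilon - j\epsilon)\right) \tag{77}$$
$$= \frac{1}{K}\sum_{k \in \mathcal{K}} \frac{1}{[\boldsymbol{\Lambda}_x]_{kk} + R^K_{\mathbf{J}_z}(-q_\epsilon - j\epsilon)}. \tag{78}$$

This completes the proof.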


Appendix C 
Proof of (63) & (64)

Let us first consider $\mathbf{J}_z = \mathbf{A}^\dagger\boldsymbol{\Lambda}_z\mathbf{A}$. Note that we do not
assume that $\mathbf{A}$ and $\boldsymbol{\Lambda}_z$ are independent. On the other hand,
Assumption 2 results in $\mathbf{A}^\dagger\mathbf{A}$ and $\boldsymbol{\Lambda}_z$ being asymptotically
free of each other. In this way we can find $R_{\mathbf{J}_z}$ by means of
the so-called multiplicative free convolution [19]. However, this
requires the reader to be familiar with the S-transform in free
probability. In fact, by invoking standard random matrix results
we can bypass the need for using the S-transform. Specifically,
from the well-known Marchenko-Pastur theorem, we can write

$$G_{\mathbf{J}_z}(s) = \frac{1}{-s + \displaystyle\int \frac{dP_{\boldsymbol{\Lambda}_z}(x)}{1/x + G_{\mathbf{J}_z}(s)/\alpha}}. \tag{79}$$

The result (79) is proven under the assumptions that the entries
of $\mathbf{A}$ are iid (not necessarily Gaussian) with zero mean and
$\mathbf{A}$ is independent of $\boldsymbol{\Lambda}_z$ [20]. Due to the asymptotic freeness,
this result holds when the entries are restricted to be Gaussian
but without the restriction that $\mathbf{A}$ and $\boldsymbol{\Lambda}_z$ are independent. Now
by letting $s = G_{\mathbf{J}_z}^{-1}(-\omega)$ in (79) and from the definition of
the R-transform in (69) we have

$$R_{\mathbf{J}_z}(\omega) = \int \frac{dP_{\boldsymbol{\Lambda}_z}(x)}{1/x - \omega/\alpha}. \tag{80}$$

Furthermore, following identical arguments for $\mathbf{J}_x$ we find

$$R_{\mathbf{J}_x}(\omega) = \frac{1}{\alpha}\int \frac{dP_{\boldsymbol{\Lambda}_x}(x)}{x - \omega}. \tag{81}$$

Due to [15, Lemma 3.3.4], the right-hand sides of the
expressions in (63) and (64) converge uniformly to (80) and
(81), respectively. This completes the proof.












